NSF PAR Search | NSF Public Access Repository

Simulated Infectious Diseases Datasets with Controlled Data Bias

https://doi.org/10.1145/3711896.3737401

Kong, Ruochen; Anderson, Taylor; Scotch, Matthew; Heslop, David J; Khaokaew, Yonchanok; Xue, Hao; Xiong, Li; MacIntyre, Chandini Raina; Salim, Flora D; Züfle, Andreas (August 2025, ACM)

Free, publicly-accessible full text available August 3, 2026
Urban Anomalies: A Simulated Human Mobility Dataset with Injected Anomalies

https://doi.org/10.1145/3681765.3698459

Amiri, Hossein; Kong, Ruochen; Züfle, Andreas (October 2024, ACM)

Human mobility anomaly detection based on location is essential in areas such as public health, safety, welfare, and urban planning. Developing models and approaches for location-based anomaly detection requires a comprehensive dataset. However, privacy concerns and the absence of ground truth limit the availability of public datasets. With this paper, we provide extensive simulated human mobility datasets featuring various anomaly types created using an existing Urban Patterns of Life Simulation. To create these datasets, we inject changes in the logic of individual agents to change their behavior. Specifically, we create four types of anomalous agent behavior by (1) changing the agents’ appetite (causing agents to have meals more frequently), (2) changing their group of interest (causing agents to interact with different agents from another group), (3) changing their social place selection (causing agents to visit different recreational places), and (4) changing their work schedule (causing agents to skip work). For each type of anomaly, we use three degrees of behavioral change to tune the difficulty of detecting the anomalous agents. To select agents to inject anomalous behavior into, we employ three methods: (1) random selection using a centralized manipulation mechanism, (2) spread-based selection using an infectious disease model, and (3) selection through exposure of agents to a specific location. All datasets are split into normal and anomalous phases. The normal phase, which can be used for training models of normalcy, exhibits no anomalous behavior. The anomalous phase, which can be used for testing anomaly detection algorithms, includes ground truth labels that indicate, for each five-minute simulation step, which agents are anomalous at that time. Datasets are generated using the maps (roads and buildings) for Atlanta and Berlin, with 1k agents in each simulation. All datasets are openly available at https://osf.io/dg6t3/.
Additionally, we provide instructions to regenerate the data for other locations and numbers of agents.
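The per-step ground truth labels described above lend themselves to a simple evaluation of a detector's output. The following is a minimal sketch, not the authors' evaluation code: it assumes labels can be represented as (simulation step, agent id) pairs, which is a hypothetical layout and not the documented file format of the published datasets.

```python
from typing import Set, Tuple

# Hypothetical label layout: (simulation_step, agent_id).
# Each simulation step covers five minutes, as stated in the abstract.
Label = Tuple[int, int]
MINUTES_PER_STEP = 5


def step_to_minutes(step: int) -> int:
    """Convert a simulation step index to elapsed minutes."""
    return step * MINUTES_PER_STEP


def precision_recall(predicted: Set[Label], truth: Set[Label]) -> Tuple[float, float]:
    """Compare flagged (step, agent) pairs against ground truth pairs."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy example: agent 7 is anomalous at steps 0 and 1, agent 12 at step 1.
truth = {(0, 7), (1, 7), (1, 12)}
# A hypothetical detector flags four (step, agent) pairs.
predicted = {(1, 7), (1, 12), (2, 3), (0, 1)}

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Scoring at the level of (step, agent) pairs rewards detectors that flag an agent only while it is actually anomalous. Scoring per agent over the whole anomalous phase would be a coarser alternative.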
Full Text Available
An Infectious Disease Spread Simulation to Control Data Bias

https://doi.org/10.1145/3678717.3691293

Kong, Ruochen; Anderson, Taylor; Heslop, David; Zufle, Andreas (October 2024, ACM)

The increased availability of datasets during the COVID-19 pandemic enabled machine-learning approaches for modeling and forecasting infectious diseases. However, such approaches are known to amplify the bias in the data they are trained on. Bias in input data such as clinical case data for COVID-19 is difficult to measure due to disparities in testing availability, reporting standards, and healthcare access among different populations and regions. Furthermore, the way such biases may propagate through the modeling pipeline to decision-making is relatively unknown. Therefore, we present a system that leverages a highly detailed agent-based model (ABM) of infectious disease spread in a city to simulate the collection of biased clinical case data where the bias is known. Our system allows users to load a pre-selected region or select their own (using OpenStreetMap data for the environment and census data for the population), and to specify population and infectious disease parameters as well as the degree(s) to which different populations will be overrepresented or underrepresented in the case data. In addition to the system, we provide a large number of benchmark datasets that produce case data at different levels of bias for different regions. We hope that infectious disease modelers will use these datasets to investigate how robust their models are to data bias and whether their models are overfit to biased data.
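The idea of collecting case data with a known bias can be illustrated with a small sketch. This is not the system described in the abstract: the function name, the group labels, and the per-group reporting probabilities are all hypothetical, chosen only to show how a controlled reporting rate makes the bias in the observed data a known quantity.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Case = Tuple[int, str]  # (case_id, population_group); hypothetical layout


def sample_reported_cases(
    true_cases: List[Case],
    report_prob: Dict[str, float],
    seed: int = 0,
) -> List[Case]:
    """Keep each true case with the reporting probability of its group.

    The reporting probabilities are the known, controlled bias.
    """
    rng = random.Random(seed)
    return [
        (case_id, group)
        for case_id, group in true_cases
        if rng.random() < report_prob[group]
    ]


# Ground truth from a simulation: equal case counts in two groups.
true_cases = [(i, "A") for i in range(1000)] + [(i, "B") for i in range(1000, 2000)]

# Assumed bias: group B is underrepresented in the reported data.
report_prob = {"A": 0.8, "B": 0.4}

reported = sample_reported_cases(true_cases, report_prob)
print(Counter(group for _, group in reported))
```

Because the true cases and the reporting probabilities are both known, a model trained on `reported` can be checked against the unbiased ground truth.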
Full Text Available
Human Mobility Challenge: Are Transformers Effective for Human Mobility Prediction?

https://doi.org/10.1145/3681771.3700130

Kong, Ruochen; Amiri, Hossein; Liu, Yueyang; Kennedy, Lance; Gupta, Misha; Kim, Joon-Seok; Züfle, Andreas (October 2024, ACM)

Transformer-based models are popular for time series forecasting and spatiotemporal prediction due to their ability to infer semantic correlations in long sequences. However, for human mobility prediction, temporal correlations, such as location patterns at the same time on previous days or weeks, are essential. While positional encodings help retain order, the self-attention mechanism causes a loss of temporal detail. To validate this claim, we used a simple approach in the 2nd ACM SIGSPATIAL Human Mobility Prediction Challenge, predicting locations based on past patterns weighted by reliability scores for missing data. Our simple approach was among the top 10 competitors and significantly outperformed the Transformer-based model that won the 2023 challenge.
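One possible reading of "past patterns weighted by reliability scores" is a weighted vote over the locations observed in the same time slot on previous days. The sketch below follows that reading only. It is an assumption, not the authors' challenge submission, and the function name, input layout, and reliability values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical observation: (location, reliability score in [0, 1]).
# A lower score might mark a value that was imputed rather than observed.
Observation = Tuple[Hashable, float]


def predict_location(history: List[Optional[Observation]]) -> Optional[Hashable]:
    """Predict the location for one time slot from the same slot on past days.

    `history` holds one entry per previous day; None marks a missing record.
    The location with the highest summed reliability wins.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for observation in history:
        if observation is None:
            continue  # skip days with no record for this slot
        location, reliability = observation
        scores[location] += reliability
    if not scores:
        return None  # no usable history for this slot
    return max(scores, key=lambda location: scores[location])


# Same time slot on five previous days; the third day is missing.
history = [("home", 1.0), ("cafe", 0.3), None, ("cafe", 0.4), ("home", 0.2)]
prediction = predict_location(history)
print(prediction)
```

In this toy input, "home" accumulates a score of 1.2 against 0.7 for "cafe", so the vote favors "home" even though both locations appear twice.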
Full Text Available
Leveraging Simulation Data to Understand Bias in Predictive Models of Infectious Disease Spread

https://doi.org/10.1145/3660631

Züfle, Andreas; Salim, Flora; Anderson, Taylor; Scotch, Matthew; Xiong, Li; Sokol, Kacper; Xue, Hao; Kong, Ruochen; Heslop, David; Paik, Hye-Young; et al (June 2024, ACM Transactions on Spatial Algorithms and Systems)

The spread of infectious diseases is a highly complex spatiotemporal process, difficult to understand, predict, and effectively respond to. Machine learning and artificial intelligence (AI) have achieved impressive results in other learning and prediction tasks; however, while many AI solutions are developed for disease prediction, only a few of them are adopted by decision-makers to support policy interventions. Among several issues preventing their uptake, AI methods are known to amplify the bias in the data they are trained on. This is especially problematic for infectious disease models that typically leverage large, open, and inherently biased spatiotemporal data. These biases may propagate through the modeling pipeline to decision-making, resulting in inequitable policy interventions. Therefore, there is a need to gain an understanding of how the AI disease modeling pipeline can mitigate biased input data, in-processing models, and biased outputs. Specifically, our vision is to develop a large-scale micro-simulation of individuals from which human mobility, population, and disease ground-truth data can be obtained. From this complete dataset—which may not reflect the real world—we can sample and inject different types of bias. By using the sampled data in which bias is known (as it is given as the simulation parameter), we can explore how existing solutions for fairness in AI can mitigate and correct these biases and investigate novel AI fairness solutions. Achieving this vision would result in improved trust in such models for informing fair and equitable policy interventions.
Full Text Available
